Search | WHO COVID-19 Research Database

Simple TICO-19: A Dataset for Joint Translation and Simplification of COVID-19 Texts

Shardlow, M.; Alva-Manchego, F.

LREC 2022: Thirteenth International Conference on Language Resources and Evaluation; 3093-3102, 2022.

Article in English | Web of Science | ID: covidwho-2310924

ABSTRACT

Specialist high-quality information is typically first available in English, and it is written in language that most readers may find difficult to understand. While Machine Translation technologies help mitigate the first issue, the translated content will most likely still contain complex language. To investigate and address both problems simultaneously, we introduce Simple TICO-19, a new language resource containing manual simplifications of the English and Spanish portions of the TICO-19 corpus for Machine Translation of COVID-19 literature. We provide an in-depth description of the annotation process, which entailed designing an annotation manual and employing four annotators (two native English speakers and two native Spanish speakers) who simplified over 6,000 sentences from the English and Spanish portions of the TICO-19 corpus. We report several statistics on the new dataset, focusing on analysing the improvements in readability from the original texts to their simplified versions. In addition, we propose baseline methodologies for automatically generating the simplifications, translations, and joint translations and simplifications contained in our dataset.

Towards Readability-Controlled Machine Translation of COVID-19 Texts

Alva-Manchego, F.; Shardlow, M.

23rd Annual Conference of the European Association for Machine Translation, EAMT 2022; 287-288, 2022.

Article in English | Scopus | ID: covidwho-2044862

ABSTRACT

This project investigates the capabilities of machine translation (MT) models for generating translations at varying levels of readability, focusing on texts about COVID-19. With funding from the European Association for Machine Translation and the Centre for Advanced Computational Sciences at Manchester Metropolitan University, we collected manual simplifications for English and Spanish texts in the TICO-19 dataset, and assessed the performance of neural MT models on this new benchmark. Future work will implement models that jointly translate and simplify, and develop suitable evaluation metrics. © 2022 The authors.
